Goto

Collaborating Authors

 Andaman Sea


Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias Analysis

Frotscher, Alexander, Baumgartner, Christian F., Wolfers, Thomas

arXiv.org Artificial Intelligence

Deep unsupervised anomaly detection in brain magnetic resonance imaging offers a promising route to identify pathological deviations without requiring lesion-specific annotations. Yet, fragmented evaluations, heterogeneous datasets, and inconsistent metrics have hindered progress toward clinical translation. Here, we present a large-scale, multi-center benchmark of deep unsupervised anomaly detection for brain imaging. The training cohort comprised 2,976 T1 and 2,972 T2-weighted scans from healthy individuals across six scanners, with ages ranging from 6 to 89 years. Validation used 92 scans to tune hyperparameters and estimate unbiased thresholds. Testing encompassed 2,221 T1w and 1,262 T2w scans spanning healthy datasets and diverse clinical cohorts. Across all algorithms, the Dice-based segmentation performance varied between 0.03 and 0.65, indicating substantial variability. To assess robustness, we systematically evaluated the impact of different scanners, lesion types and sizes, as well as demographics (age, sex). Reconstruction-based methods, particularly diffusion-inspired approaches, achieved the strongest lesion segmentation performance, while feature-based methods showed greater robustness under distributional shifts. However, systematic biases, such as scanner-related effects, were observed for the majority of algorithms, including that small and low-contrast lesions were missed more often, and that false positives varied with age and sex. Increasing healthy training data yields only modest gains, underscoring that current unsupervised anomaly detection frameworks are limited algorithmically rather than by data availability. Our benchmark establishes a transparent foundation for future research and highlights priorities for clinical translation, including image native pretraining, principled deviation measures, fairness-aware modeling, and robust domain adaptation.


Model Parallelism With Subnetwork Data Parallelism

Singh, Vaibhav, Khalid, Zafir, Oyallon, Edouard, Belilovsky, Eugene

arXiv.org Artificial Intelligence

Pre-training large neural networks at scale imposes heavy memory demands on accelerators and often requires costly communication. We introduce Subnetwork Data Parallelism (SDP), a distributed training framework that partitions a model into structured subnetworks trained across workers without exchanging activations. We study two complementary masking regimes: backward masking, which applies sparsity only in the backward step to retain unbiased gradients, and forward masking, which also removes parameters in the forward pass to deliver stronger efficiency gains while providing additional regularization. We further explore two subnetwork construction strategies: neuron level and block level, applied across both CNNs and transformers. In experiments spanning CNNs and transformers on CIFAR and ImageNet, as well as LLM pre-training on FineWeb, SDP reduces per-device memory usage by 30%-75% while maintaining or improving performance. Notably, in FLOP-matched settings, forward masking can sometimes achieve better performance.


Personalized and Demand-Based Education Concept: Practical Tools for Control Engineers

Varga, Balint, Fischer, Lars, Kovacs, Levente

arXiv.org Artificial Intelligence

This paper presents a personalized lecture concept using educational blocks and its demonstrative application in a new university lecture. Higher education faces daily challenges: deep and specialized knowledge is available from everywhere and accessible to almost everyone. University lecturers of specialized master courses confront the problem that their lectures are either too boring or too complex for the attending students. Additionally, curricula are changing more rapidly than they have in the past 10-30 years. The German education system comprises different educational forms, with universities providing less practical content. Consequently, many university students do not obtain the practical skills they should ideally gain through university lectures. Therefore, in this work, a new lecture concept is proposed based on the extension of the just-in-time teaching paradigm: Personalized and Demand-Based Education. This concept includes: 1) an initial assessment of students' backgrounds, 2) selecting the appropriate educational blocks, and 3) collecting ongoing feedback during the semester. The feedback was gathered via Pingo, ensuring anonymity for the students. Our concept was exemplarily tested in the new lecture "Practical Tools for Control Engineers" at the Karlsruhe Institute of Technology. The initial results indicate that our proposed concept could be beneficial in addressing the current challenges in higher education.


Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues

Zambare, Pallavi, Thanikella, Venkata Nikhil, Liu, Ying

arXiv.org Artificial Intelligence

Pedestrian intention prediction is essential for autonomous driving in complex urban environments. Conventional approaches depend on supervised learning over frame sequences and require extensive retraining to adapt to new scenarios. Here, we introduce BF-PIP (Beyond Frames Pedestrian Intention Prediction), a zero-shot approach built upon Gemini 2.5 Pro. It infers crossing intentions directly from short, continuous video clips enriched with structured JAAD metadata. In contrast to GPT-4V based methods that operate on discrete frames, BF-PIP processes uninterrupted temporal clips. It also incorporates bounding-box annotations and ego-vehicle speed via specialized multimodal prompts. Without any additional training, BF-PIP achieves 73% prediction accuracy, outperforming a GPT-4V baseline by 18 %. These findings illustrate that combining temporal video inputs with contextual cues enhances spatiotemporal perception and improves intent inference under ambiguous conditions. This approach paves the way for agile, retraining-free perception module in intelligent transportation system.


Evaluating the Goal-Directedness of Large Language Models

Everitt, Tom, Garbacea, Cristina, Bellot, Alexis, Richens, Jonathan, Papadatos, Henry, Campos, Siméon, Shah, Rohin

arXiv.org Artificial Intelligence

To what extent do LLMs use their capabilities towards their given goal? We take this as a measure of their goal-directedness. We evaluate goal-directedness on tasks that require information gathering, cognitive effort, and plan execution, where we use subtasks to infer each model's relevant capabilities. Our evaluations of LLMs from Google DeepMind, OpenAI, and Anthropic show that goal-directedness is relatively consistent across tasks, differs from task performance, and is only moderately sensitive to motivational prompts. Notably, most models are not fully goal-directed. We hope our goal-directedness evaluations will enable better monitoring of LLM progress, and enable more deliberate design choices of agentic properties in LLMs.


Computational Synthesis of Wearable Robot Mechanisms: Application to Hip-Joint Mechanisms

Kang, Seok Won, Ryu, Jegyeong, Kim, Suh In, Kim, Youngsoo, Kim, Yoon Young

arXiv.org Artificial Intelligence

Since wearable linkage mechanisms could control the moment transmission from actuator(s) to wearers, they can help ensure that even low-cost wearable systems provide advanced functionality tailored to users' needs. For example, if a hip mechanism transforms an input torque into a spatially-varying moment, a wearer can get effective assistance both in the sagittal and frontal planes during walking, even with an affordable single-actuator system. However, due to the combinatorial nature of the linkage mechanism design space, the topologies of such nonlinear-moment-generating mechanisms are challenging to determine, even with significant computational resources and numerical data. Furthermore, on-premise production development and interactive design are nearly impossible in conventional synthesis approaches. Here, we propose an innovative autonomous computational approach for synthesizing such wearable robot mechanisms, eliminating the need for exhaustive searches or numerous data sets. Our method transforms the synthesis problem into a gradient-based optimization problem with sophisticated objective and constraint functions while ensuring the desired degree of freedom, range of motion, and force transmission characteristics. To generate arbitrary mechanism topologies and dimensions, we employed a unified ground model. By applying the proposed method for the design of hip joint mechanisms, the topologies and dimensions of non-series-type hip joint mechanisms were obtained. Biomechanical simulations validated its multi-moment assistance capability, and its wearability was verified via prototype fabrication. The proposed design strategy can open a new way to design various wearable robot mechanisms, such as shoulders, knees, and ankles.


Habits of Mind: Reusing Action Sequences for Efficient Planning

Éltető, Noémi, Dayan, Peter

arXiv.org Artificial Intelligence

When we exercise sequences of actions, their execution becomes more fluent and precise. Here, we consider the possibility that exercised action sequences can also be used to make planning faster and more accurate by focusing expansion of the search tree on paths that have been frequently used in the past, and by reducing deep planning problems to shallow ones via multi-step jumps in the tree. To capture such sequences, we use a flexible Bayesian action chunking mechanism which finds and exploits statistically reliable structure at different scales. This gives rise to shorter or longer routines that can be embedded into a Monte-Carlo tree search planner. We show the benefits of this scheme using a physical construction task patterned after tangrams.


Block-wise Training of Residual Networks via the Minimizing Movement Scheme

Karkar, Skander, Ayed, Ibrahim, de Bézenac, Emmanuel, Gallinari, Patrick

arXiv.org Artificial Intelligence

End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device training of neural networks. We develop a layer-wise training method, particularly welladapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It works by alleviating the stagnation problem observed in layer-wise training, whereby greedily-trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth. We show on classification tasks that the test accuracy of block-wise trained ResNets is improved when using our method, whether the blocks are trained sequentially or in parallel.


Open Question Answering over Tables and Text

Chen, Wenhu, Chang, Ming-Wei, Schlinger, Eva, Wang, William, Cohen, William W.

arXiv.org Artificial Intelligence

In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. Here we consider for the first time open QA over both tabular and textual data and present a new large-scale dataset Open Table-Text Question Answering (OTT-QA) to evaluate performance on this task. Most questions in OTT-QA require multi-hop inference across tabular data and unstructured text, and the evidence required to answer a question can be distributed in different ways over these two types of input, making evidence retrieval challenging---our baseline model using an iterative retriever and BERT-based reader achieves an exact match score less than 10%. We then propose two novel techniques to address the challenge of retrieving and aggregating evidence for OTT-QA. The first technique is to use "early fusion" to group multiple highly relevant tabular and textual units into a fused block, which provides more context for the retriever to search for. The second technique is to use a cross-block reader to model the cross-dependency between multiple retrieved evidences with global-local sparse attention. Combining these two techniques improves the score significantly, to above 27%.


Hierarchical nucleation in deep neural networks

Doimo, Diego, Glielmo, Aldo, Ansuini, Alessio, Laio, Alessandro

arXiv.org Machine Learning

Deep convolutional networks (DCNs) learn meaningful representations where data that share the same abstract characteristics are positioned closer and closer. Understanding these representations and how they are generated is of unquestioned practical and theoretical interest. In this work we study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs. We find that the initial layers generate a unimodal probability density getting rid of any structure irrelevant for classification. In subsequent layers density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. Density peaks corresponding to single categories appear only close to the output and via a very sharp transition which resembles the nucleation process of a heterogeneous liquid. This process leaves a footprint in the probability density of the output layer where the topography of the peaks allows reconstructing the semantic relationships of the categories.